Statistical Machine Translation of Serbian-English
Abstract
In this work we present the first results of a statistical approach to machine translation of Serbian into English and vice versa. The experiments are performed on the Assimil language course, a bilingual parallel corpus of about 3k sentences and 20k running words from an unrestricted domain. The error rates are about 35-45% for translation from Serbian into English and about 45-55% for the other direction. These results are comparable with those reported for other language pairs translated using the statistical approach. Reducing Serbian words to their stems decreased the error rates for translation into English by about 8% relative.
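The "8% relative" figure above is a relative, not an absolute, reduction in error rate. The following is a minimal sketch of that arithmetic; the function name `relative_reduction` and the two error rates are hypothetical illustrations, not values taken from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent (assumed for illustration only).
baseline = 40.0  # full word forms
stemmed = 36.8   # Serbian words reduced to stems

print(f"absolute reduction: {baseline - stemmed:.1f} points")
print(f"relative reduction: {relative_reduction(baseline, stemmed):.1%}")
```

Under these assumed numbers, a drop of 3.2 absolute points from a 40% baseline corresponds to an 8% relative reduction.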
Similar Resources
Towards the Use of Word Stems and Suffixes for Statistical Machine Translation
In this paper we present methods for improving the quality of translation from an inflected language into English by making use of part-of-speech tags and word stems and suffixes in the source language. Results for translations from Spanish and Catalan into English are presented on the LC-STAR trilingual corpus which consists of spontaneously spoken dialogues in the domain of travelling and app...
Augmenting a Small Parallel Text with Morpho-syntactic Language Resources for Serbian-English Statistical Machine Translation
In this work, we examine the quality of several statistical machine translation systems constructed on a small amount of parallel Serbian-English text. The main bilingual parallel corpus consists of about 3k sentences and 20k running words from an unrestricted domain. The translation systems are built on the full corpus as well as on a reduced corpus containing only 200 parallel sentences. A sm...
Identifying main obstacles for statistical machine translation of morphologically rich South Slavic languages
The best way to improve a statistical machine translation system is to identify concrete problems causing translation errors and address them. Many of these problems are related to the characteristics of the involved languages and differences between them. This work explores the main obstacles for statistical machine translation systems involving two morphologically rich and under-resourced lan...
Exploring cross-language statistical machine translation for closely related South Slavic languages
This work investigates the use of cross-language resources for statistical machine translation (SMT) between English and two closely related South Slavic languages, namely Croatian and Serbian. The goal is to explore the effects of translating from and into one language using an SMT system trained on another. For translation into English, a loss due to cross-translation is about 13% of BLEU and ...
Enlarging Scarce In-domain English-Croatian Corpus for SMT of MOOCs Using Serbian
Massive Open Online Courses have been growing rapidly in size and impact. Yet the language barrier constitutes a major growth impediment in reaching out all people and educating all citizens. A vast majority of educational material is available only in English, and state-of-the-art machine translation systems still have not been tailored for this peculiar genre. In addition, a mere collection o...
Publication date: 2004